Mechanical response of random heteropolymers 
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We present an analytical theory for heteropolymer deformation, as exemplified experimentally 
by stretching of single protein molecules. Using a mean-field replica theory, we determine phase 
diagrams for stress-induced unfolding of typical random sequences. This transition is sharp in the 
limit of infinitely long chain molecules. But for chain lengths relevant to biological macromolecules, 
partially unfolded conformations prevail over an intermediate range of stress. These necklace-like 
structures, comprised of alternating compact and extended subunits, are stabilized by quenched 
variations in the composition of finite chain segments. The most stable arrangements of these 
subunits are largely determined by preferential extension of segments rich in solvophilic monomers. 
This predicted significance of necklace structures explains recent observations in protein stretching 
experiments. We examine the statistical features of select sequences that give rise to mechanical 
strength and may thus have guided the evolution of proteins that carry out mechanical functions in 
living cells. 



I. INTRODUCTION 

Several recent experiments have highlighted the me- 
chanical strength of proteins involved in muscle elas- 
ticity and cell adhesion When pulled from the 
ends, these molecules can withstand significant stress be- 
fore their constituent domains unfold from compact na- 
tive states to extended coil-like structures. This stress- 
induced unfolding occurs sharply, with threshold forces 
ft on the order of 100 pN. In natural units for these sys- 
tems, ft ^ lOOfceT/a, where /cb is Boltzmann's constant 
(which we subsequently set to unity), T is temperature, 
and a ~ 1 nm is the size of a typical amino acid. By 
contrast, studies of proteins whose functions are not me- 
chanical in nature have revealed lower threshold forces 
ft ~ IQk^T/a) and less dramatic stretching behavior 
|j. Specifically, relative fluctuations in restoring force 
are considerably larger than for mechanical proteins, and 
the stretching transition is less sharply defined. Evolu- 
tion therefore appears to have designed certain proteins 
to unfold reproducibly under critical stress. 

Results of computer simulations have lent details to 
this notion of mechanical design. Random heteropoly- 
mers on a lattice, which may be viewed as coarse- 
grained caricatures of proteins, exhibit stretching be- 
havior that depends strongly on the sequence of con- 
stituent monomers Typical random sequences elon- 
gate smoothly under stress, passing through one or more 
long-lived, partially extended structures. Sequences se- 
lected for their ability to fold rapidly in the absence of 
stress, however, undergo a relatively sharp force-induced 
transition. Folding efhciency is thus correlated with me- 
chanical strength to some extent. But reaction coordi- 
nates for protein folding are only loosely coupled to the 
simple mechanical variables manipulated in stretching 
experiments [Q. In an ensemble of fast-folding sequences, 
a range of mechanical stabilities is therefore expected, 
due to variations in sequence properties that do not af- 
fect folding dynamics. Native state topology may be one 
such property, according to results of protein stretching 



simulations with atomistic detail But due to the 

computational expense of such simulations, they provide 
only anecdotal insight into the relationship between se- 
quence and mechanical strength. 

A general understanding of heteropolymer deformation 
can only be obtained by considering the ensemble of pos- 
sible sequences. We have recently described the results 
of an analytical theory that takes the diversity of this 
ensemble properly into account pO) . Specifically, the 
free energies of various conformational states are aver- 
aged over the distribution of sequences, using the replica 
trick pT[ . In this way, the mechanical response typical 
of random heteropolymers is determined. This article 
presents our theoretical approach in detail. 

Because we focus on equilibrium free energetics, our 
theory directly applies only to reversible stretching, i.e., 
pulling rates that are much slower than rates of sponta- 
neous unfolding. Stretching experiments, on the other 
hand, have been performed irreversibly, as evidenced 
by wide hysteresis Q. Relating theoretical results to 
these experiments is made possible by an ide ntity for 
nonequilibrium dynamics obtained by Jarzynski |12[| and 
by Crooks p^ . In particular. Hummer and Szabo have 
shown that reversible stretching behavior may be ex- 
tracted from repeated nonequilibrium measurements . 
Equilibrium results determined in this way differ only in 
details from their nonequilibrium counterparts. For ex- 
ample, the threshold force required to unfold a mechan- 
ical protein reversibly is smaller than the corresponding 
nonequilibrium value, but the induced transition remains 
sharp . Qualitative predictions of our theory are thus 
relevant to current experiments. Direct comparisons are 
possible in principle when experimental measurements 
have been repeated sufRciently. 

The coarse features of heteropolymer response do not 
differ significantly from those of a homopolymer. In poor 
solvent (T < 0), a chain molecule is transformed by stress 
from collapsed globule to expanded coil. This transfor- 
mation occurs abruptly for homopolymers, as determined 
by scaling analysis p5|. Globule deformation is strongly 
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resisted by the cost of enlarging the polymer-solvent in- 
terface, while a coil is quite pliable. The "phase transi- 
tion" between these states is thus accompanied by a sharp 
change in extension. At phase coexistence, the free ener- 
getic equivalence of globule and coil gives rise to necklace- 
like structures, in which compact and expanded subunits 
alternate within the chain (as sketched in Fig. |l|) . In the 
case of honiopolymers, these partially extended struc- 
tures are unstable away from coexistence. A phase di- 
agram for homopolymer stretching is constructed from 
this physical picture in Sec. II. 

Necklace structures figure more prominently in the 
cases of polyelectrolytes and polyampholytes. When 
a significant fraction of monomers carry charge, fully 
compact conformations are unstable, in analogy to the 
Rayleigh instability of charged droplets As a re- 

sult, the chain segregates into a series of smaller, teth- 
ered globules. Such necklaces are the ground states of 
sufficiently charged polyelectrolytes, even in the absence 
of stress. Stretching these molecules modifies the the 
details of structural partitioning, reducing the sizes and 
numbers of compact subunits and lengthening the string- 
like subunits that connect them [pT 18|. 

Necklaces play an enhanced role in heteropolymer 
stretching as well, although for different reasons. The 
quenched disorder of random sequences lends different 
degrees of mechanical susceptibility to different regions 
of the chain. Depending on the extent of this heterogene- 
ity, necklace structures may dominate over an apprecia- 
ble range of stress. In effect, the globule-coil coexistence 
region is broadened by disorder. In Sec. Ill, we determine 
the magnitude of this broadening by analyzing a micro- 
scopic model for heteropolymer deformation. For uncor- 
related sequence statistics, we show that necklaces are 
stabilized over a stress interval of relative width N~-^/^, 
where N is the number of monomers per molecule. The 
effects of correlations within a sequence, also examined 
in Sec. Ill, suggest that certain statistical patterns are 
strongly correlated with mechanical strength. These pat- 
terns are consistent with the structures of mechanical 
proteins designed by evolution. 

In seeking a microscopic explanation for the stretch- 
ing behavior of proteins, we focus on the effects of het- 
eropolymeric disorder. Electrostatic effects are expected 
to infiuence protein stability less strongly at physiological 
conditions [ p9|p0| . We also focus on the response to ex- 
ternal stress, rather than to strain. Although the exten- 
sion of protein molecules is constrained in experiments, 
the flexibility of unfolded segments in these modular 
structures mediates the applied strain. Indeed, provided 
the contour lengths of unfolded regions, simple elastic 
models account for the measured restoring forces of mod- 
ular proteins. Individual, folded domains are thus effec- 
tively subjected to uniform external stress. The model we 
analyze in Sec. Ill is tailored to these external conditions 
appropriate for experiments and for biological function. 



FIG. 1. Schematic example of a necklace-like polymeric 
structure. Some regions of the chain adopt locally com- 
pact, globular conformations, while others exist in extended 
coil-like states. 



II. HOMOPOLYMER STRETCHING 

We employ simple, mean-field descriptions of the con- 
formational states relevant to polymer deformation. For 
instance, the free energy of a homopolymer globule rela- 
tive to that of an ideal coil, 



(1) 



is dominated by the effective interactions between 
monomers. Here, p is the monomer density, and 7 is 
the surface tension of the globule-solvent interface. The 
energy density of monomer attractions, Bq = T — 9, sta- 
bilizes the compact state for T < 9. We focus on tem- 
peratures below the 0-region, for which the globule is 
highly compact, i.e., pv ^ 1, where v is the volume of a 
monomer. At this level of description, the contribution 
of stress to Eq. g is negligible within the regime of globule 
stability. 

We represent the coil state by a freely jointed chain 
with segment length a. This model provides the sim- 
plest description of polymer flexibility that ensures a fi- 
nite maximum chain extension, Na. This condition is 
important at low temperatures, where the coil is highly 
extended under stress. The free energy of coil deforma- 
tion is easily computed from this model 12^ : 



T,{N)=-NT\ii [y{fa/T)], 



(2) 



where y{x) = sinh(x)/a;. The stretching force at which 
this extended coil coexists with the compact state is de- 
termined by equating and Tg-. 



ft 



(3) 



This phase boundary is plotted as a function of tempera- 
ture in Fig. H for various N. Qualitative features of these 
phase diagrams for homopolymer stretching compare well 
with results of lattice polymer simulations At low 
temperatures, a reentrant coil phase appears. This rod- 
like state involves small fluctuations about a fully ex- 
tended structure, as has been noted in simulation work 
p2t . Similar reentrance has been predicted for the me- 
chanical unzipping of DNA at low temperatures l23,PJ| . 
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FIG. 2. Phase diagram for homopolymer stretching com- 
puted from Eq. ^ The boundary separating globule and 
coil states is plotted for chain lengths A'^ = 100 (dot-dashed 
curve), A'^ = 1000 (dashed curve), and for the infinite chain 
limit (solid curve). Here, we have taken the surface free en- 
ergy density of the globule to be comparable to the energy 
density of monomer interactions, 7 « 



III. HETEROPOLYMER STRETCHING 

Heterogeneity of monomer types has several conse- 
quences for the deformation scenario described above. 
First, the fully compact state is dominated by only a 
few distinct conformations at low temperature. The cor- 
responding freezing transition has been analyzed thor- 
oughly 26|. For necklace structures, each compact 
subunit can freeze in this way. Because the composition 
of these subunits is randomly distributed, the details of 
freezing will differ for each. Specifically, each subunit 
will have a dilferent ground state energy, and thus a dif- 
ferent stability. In addition, variations in sequence com- 
position will strongly influence the solvation energetics 
of expanded subunits. As a result, the susceptibility of 
a given region of the chain to extension is effectively a 
random variable. This situation is illustrated in Fig. ^. 

We weigh these effects of disorder using a microscopic 
model of random heteropolymers. For a particular real- 
ization of monomer identities, cr^, the energy of a chain 
conformation is 



H-Ho + r^tT, -f • (r^-n). 



(4) 



N 



Here, denotes the position of the ith monomer in the 
chain, and f is the external stretching force coupled to 
the end-to-end vector, rjq — v\. The connectivity of con- 
secutive monomers is implicit in Eq. |[ We take the links 



FIG. 3. Random potential u(a;) representing the depen- 
dence of necklace energy on the sequence location a; of a glob- 
ular subunit. Filled and unfilled circles denote monomers of 
two diff'erent types. Because the local composition of the se- 
quence differs at X\, X2, and x-i, globular regions at these 
locations will have different ground state energies. More im- 
portantly, the composition of exposed coil regions differs for 
the three cases. For a random sequence of monomers, the 
total necklace energy is thus a random variable for different 
subunit arrangements. The statistics of these variations de- 
termine the size of random fluctuations, Au. 



between connected monomers i and i + 1 to be distributed 
according to a function gdri+i — rj|) with range a. Un- 
connected monomers i and j interact only when they are 
in contact, as described by the Dirac delta function in 
Eq. ^ The heteropolymeric part of this interaction de- 
pends on the identities of the monomers involved. Two 
monomer types, ct; — ±1, are possible at each point in 
the sequence. These types could represent, for exam- 
ple, amino acids with hydrophilic and hydrophobic side 
chains. We choose % < 0, so that each monomer attracts 
others of the same type most strongly. The sequence 
of monomer types Ui is fixed for each realization of the 
heteropolymer. As in related problems, this quenched 
disorder requires careful treatment. 

The second term in Eq. ^ describes the solvation ener- 
getics of monomers that are exposed to the external envi- 
ronment. The sum extends only over the set S of exposed 
monomers. Depending implicitly on chain conformation, 
this term mimics a many-body aspect of the hydrophobic 
effect, the tendency to bury unfavorably solvated regions 
of a solute. We take F > 0, so that monomers of type 
(Ti = — 1 are preferentially solvated, In the following sec- 
tions, we analyze the energetics of Eqs. ^ and || for the 
conformational states important to mechanical response. 
With these results, we construct phase diagrams for het- 
eropolymer stretching. 
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A. Globule 

The first term in Eq. ^ Ho, defines a model of a ran- 
dom copolymer in the absence of explicit solvation en- 
ergetics (r = 0) and stretching force (f = 0). A mean 
field theory for the globular state of this model was an- 
alyzed in Ref. using the replica trick. At a critical 
temperature, Tc, the heteropolymer freezes into 0(1) low 
energy conformations. This transition is accompanied by 
one-step replica symmetry breaking, indicating that the 
hierarchy of basins of attraction in conformation space 
is one level deep. This result is consistent with a ran- 
dom energy model of the system. In other words, the 
density of states is well represented by drawing energies 
at random from a Gaussian distribution, P{E), with ap- 
propriate mean, and variance, A. Our analysis of 
the globular state for the model defined by Eq. (with 
nonzero T and f) closely follows that of Rcf. Eq]. We 
focus on the way in which solvation modifies the effective 
distribution of energies. As in the homopolymeric case, 
we neglect the effect of stretching force on globule free 
energetics. 

In order to compute properties characteristic of an 
ensemble of random heteropolymers, we must average 
over their sequences. Because the disordered sequence 
is quenched, the average is correctly performed on the 
logarithm of the partition function, Z , rather than on Z 
itself. This mathematically awkward procedure can be 
accomplished using the replica trick pd| ] , 



(InZ)av^lim ^^"^- \ 

n^O n 



(6) 



where (. . .)av denotes an average over realizations of the 
random sequence. For integer values of n, the quantity 
{Z'^)iiv has the form of a partition function for n coupled 
replicas of the original system. For the model of Eq. 0, 
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a = l j=l 



X exp 



X exp 



Bo 
T 



N 



a—1 — l 



(7) 



where rf is the position of the ith monomer of replica 
number a. In order to perform the average in Eq. |7| we 
first rewrite the right hand side as 



( / Vr^9irf+^ - rf) exp [-|? ^ E ^^^'^ " 

V a=li,j = l 

X exp [bJ2j dRj2cr,S{rf - R) ^ (7jS{r'^ - R) 

a i j 

+c^/'dRp,.„(R)^a,J(r,"-R)]\ , (8) 

« i / av 

where b = — x/T, c = —T/T, and / I?r" denotes an 
integration over the monomer positions of all replicas. 
We have also defined 



p.,<,(R) = ^^(R-rn 
ies 



(9) 



as the density of exposed monomers at position R, i.e., 
the spatial pattern formed by the globule boundary. 
Here, d{r) is 1 if r = 0, and vanishes otherwise. We 
perform a Hubbard-Stratonovich transformation with re- 
spect to the field 



^a,<5(rf-R), 



(10) 



by introducing a conjugate field ipaCR)- With this trans- 
formation, the second exponential in Eq. @ becomes 



rfRV'a(R)Ps,a(R) 

a 



X exp 



X exp 



(11) 



The average over sequence realizations may now be 
easily performed, yielding an action that is highly non- 
linear in ■0q(R). As was done in Ref. p6| , we retain 
only the first term in a high-temperature expansion of 
the nonlinearity. As in that work, inclusion of additional 
terms does not change the leading order behavior of rel- 
evant order parameters. This simplification corresponds 
to treating monomer types as Gaussian, rather than bi- 
nary, variables, with distribution 



(12) 



The variety of interaction strengths described by Eq. |lj 
might be appropriate for monomers that come into con- 
tact with a variety of relative orientations. It could also 
describe a heteropolymer with more than two possible 
monomer types. 

Averaging over the effective distribution of monomer 
identities, and scaling the field V'a(R) by 2&, we obtain 
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/exp 



X y X>V„(R)exp -6 y dR-0^(R) 

+26^'^ y" dRldR2V'a(Rl)V'/3(R2)Oa/3(Rl,R2) 



X exp 



(13) 



th 



where (. . .)th denotes a thermal average over the statistics 
of monomer links imposed by giyf^x — r"). In Eq. |l^, we 
have additionally introduced two fields: a single-replica 
density field, /3a(R = X^i ^{"^i ~ R)i ^"^^ ^ ^AA^ describing 
the conformational similarity of two replicas, 



,0(Ri, R2) = ^^^'^ - - 1^2). 



(14) 



Since density fluctuations are negligible in the globu- 
lar state, P(j(R) may be approximately replaced by its 
mean value, p ~ where v is the volume of a typical 
monomer. Similarly, the surface density, /9s(R)i is es- 
sentially fixed in the globular state. The replica overlap 
function, (5a/3(Ri, R2), however, is an important mea- 
sure of the population of different conformational basins 
of attraction. It is thus a useful order parameter to de- 
scribe freezing of random heteropolymers into their low- 
est energy conformations. 

Because density is virtually constant within the glob- 
ule, QoLfi is a function of Ri — R2 only, and its Fourier 
transform depends on a single wavevector k: 



g„^(k) = y d(Ri - R2)Qa^(Ri, R2) exp [ik • (Ri - R2)] 

(15) 

With a Fourier representation of (R), the right hand 
side of Eq. 13 becomes 



PV-'ct (k) exp 



^EE^"/3(k)v;a(k)^M-k) 

Q,/3 k 



where 



+^^EE^^w^„(-k) 

a k J / th 



P„/3(k) = -U^p + 2/x2&2Q^^(k). 



(16) 



(17) 



In Eq. |T| we have omitted the first exponential of Eq. |l^, 
which contributes an irrelevant multiplicative constant. 

Following the analysis of Ref. , we use the replica 
overlap function as the order parameter for a mean field 



theory of heteropolymer freezing. To this end, we rewrite 
Eq. |l^ as a functional integral over possible realizations 
of Qq/j: 



PQ„^(k) exp h£;{Q„^(k)} + 5{Q„^(k)}] 

(18) 

Here, E is the effective energy of a particular realization: 
£;{Q„^(k)} = In /i?v«(R) 

X exp 



^EE^"/'Wv^"W^/3(-k) 



a,/3 k 



+^^EE'5s(k)4(-k) 



a k 



(ik 



lndetP„^(k) - - ^ |p,(k)|2p-^i(k) 

a,/3 



(19) 
(20) 



Similarly, S is an effective entropy describing the number 
of conformations consistent with a particular realization 
of Qq^: 



5{Q„0(k)} = 




a/3(k) 



F^exp[zk.(rr-rf)]j\ . (21 

i / / th 

We will approximate the integral in Eq. |l^ by optimizing 
the free energy with respect to replica overlap. 

We imagine that a hierarchy exists for basins of at- 
traction in conformation space, as is done in the theory 
of spin glasses ||ll|]. It is then natural to sort rephcas 
into groups, such that replicas belonging to the same 
group overlap most strongly This grouping determines 
the structure of Qq/j. If a and (i are in the same group, 
(5q/3(Ri,R2) is nearly p(5(Ri — R2). If a and /3 belong 
to widely different groups, (5q/3(Ri,R2) ~ 0. The one- 
step replica symmetry breaking demonstrated in Ref. p^ ] 
is not altered by the solvation energetics in Eq. ^, be- 
cause the scaling properties of Qa^ are unchanged. Con- 
sequently, overlap between replicas is binary: 



QQ/3(k) 



p, for a, /3 in the same group, 
0, for a,/3 in different groups. 



(22) 



Note that Qa/3(k) is independent of k, since replica over- 
lap is either absent or microscopically complete. To- 
gether with the number of replicas in each group, Eq. |2^ 
specifies (3c(/3(k) completely. 

The limit n — s- in Eq. ^ is most conveniently taken 
using a continuous representation of the replica overlap 
matrix. In this limit, the "number" of replicas in a group, 
xo, lies between and 1, and summations over replica 
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indices are replaced by integrations on the interval [0, 1]. 
The first term in Eq. ^ was computed in Ref. |2^ ] using 
the continuous form of Qap introduced by Parisi: 



In [det Pap] = In 6 



ln(l - 7x0) 



(23) 



where 7 = 2bfi^p. We evaluate the second term in Eq. |2^ 
using identities derived in Ref. for the Parisi matrix: 



E 

Q,/3 



p,(k)pp-;(k 



(1 - JXo) 



A, 



(24) 



where A is the surface area of the globule. The loss of 
entropy due to the grouping of replicas described by Qap 
was also computed in Ref. as S — NuS/xq. Here, 
s = Ina^/z) is the entropy loss per monomer of constrain- 
ing a replica to correspond to other replicas in its group at 
a microscopic level. Combining these results, and noting 
that wavevector summations contribute only unimpor- 
tant factors of volume, we obtain the replica free energy 
density: 



nN 



= ln6- 



In (1-7x0) \-iA s 

-77-1- 72^0 ^ • 

46 A* Xo 



Xq 



(25) 



Eq. |25| differs from the corresponding result in Ref. |2g] 
only by the term proportional to A/N ^ N~^^^. 

We now employ a mean field approximation by opti- 
mizing the free energy density with respect to xq- (Be- 
cause the number of pairs of replicas is negative in the 
limit n — > 0, the appropriate extremum is in fact a maxi- 
mum of F{xo).) To lowest nonvanishing order in xq, the 
mean field solution is 



Xq 




2s 



c^A/2b-fN 



(26) 



From Eq. ^6], we may identify the transition temperature, 
Tc, may be identified at which freezing occurs, i.e., at 
which Xq first deviates from unity: 

T, = (2s)-i/2(-2xmV - + 0{N-'/'). (27) 

Comparing this result with Eq. 3.9 of Ref. we find 
that the solvation term in Eq. raises by an amount 

Together with the average energy of non-native confor- 
mations, the freezing temperature in Eq. |2^ determines 
the parameters of a random energy model corresponding 
to the random heteropolymer. The effective distribution 
of energies is given by 



P{E) = (27r)i/2 exp [-{E - E f/NA^ 



(28) 



where A — \'2sTc and E — BopN. This distribution 
is dominated by states in the interval E — N^^^A < 



E 





r7v2/3 



FIG. 4. Schematic effect of solvation on the distribution 
of globule energies. Low-lying energy levels, as well as the es- 
sentially continuous part of the energy spectrum, are sketched 
at left for a heteropolymer without differential solvation of 
monomer types. Surface energetics reorder these levels on a 
scale FA'^^''"^ (as shown at right). The basic form of the dis- 
tribution is unchanged. 

E < E + N^/^A. At energies just below a critical 
value, E* = E — NAs^^"^, the number of states is 
0(1), while just above E* the number is exponentially 
large. The ground states of particular random sequences 
are distributed narrowly about E* |^^. Solvation thus 
lowers the average ground state energy by an amount 
{T^p/4\x\)s'/^A. 

Solvation of the globule surface selects a ground state 
from the set of conformations with monomer interaction 
energies E < E* + TN'^^^ . This selection is illustrated in 
Fig. 0. If the energy scale of solvation is small, |r| <^ |x|, 
the shift in ground state energy will be a negligible frac- 
tion of the optimum surface energy, TpA. In this case, 
the set of low-energy conformations from which to select 
is small, and it is unlikely that one of these conforma- 
tions presents a predominantly solvophilic surface. If, on 
the other hand, |r| « solvation can be an impor- 
tant factor in determining the ground state. In this case, 
there is a reasonable probability that a conformation with 
E < E* + rA^2/3 has favorable solvation energy. Here, 
the shift in ground state energy will be comparable to 
TpA, and the surface of the native state will be largely 
solvophilic. This solvation effect does not strongly in- 
fluence the freezing behavior studied in Ref. |26|. But 
it does represent the energetic contribution most sensi- 
tive to variations in sequence composition. It is therefore 
significant for our analysis of necklace structures. 



B. Coil 

For the coil state of a heteropolymer, we must add sol- 
vation energetics to the free energy in Eq. ^. For simplic- 
ity, we consider only random sequences whose total com- 
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position is fixed by <Ti = 0. Since nearly all monomers 
are exposed to solvent in the expanded coil state, this 
constraint causes the solvation energy in Eq. ^ to van- 
ish. Thus, the free energy of this heteropolymeric coil is 
identical to that of the homopolymeric coil discussed in 
Sec. II. In the next section, we consider necklace struc- 
tures of a random heteropolymer. In these structures, 
short segments of the chain exist in coil-like states. The 
sequence composition of these exposed segments is not 
constrained, and the solvation energy need not vanish. 
We will show this to be an important stabilizing effect 
for necklace structures. 



C. Necklace structures 

For a random heteropolymer, the numbers and sizes of 
globular subunits do not provide an adequate description 
of a necklace-like structure. Due to the heterogeneity of 
monomer interactions, the energy of such a structure de- 
pends as well on the arrangement of subunits within the 
sequence. This dependence on globule location, a;, can be 
represented as the effect of an external, random potential 
u{x) with fluctuations of size Au, as illustrated in Fig. ||. 
If Au is large, it is likely that a particular arrangement 
lies much lower in energy than others. Over some range 
of stretching forces, it is possible that such low-energy 
necklace structures are preferred over pure globule and 
coil states. In this case, necklaces will play a signifi- 
cant role in the response to stretching. Our analysis 
of heteropolymeric necklaces is guided by this perspec- 
tive. First, we characterize the statistics of the effective 
random potential. We then identify situations in which 
quenched disorder stabilizes necklace structures signifi- 
cantly. 

Although we take sequence composition to be fixed for 
the chain as a whole, the composition q of local regions 
is distributed according to 



p{q) cx exp 



2^ N' 



q = 



- y 



(29) 
(30) 



i(z segment 



where M is the length of the segment. Because differ- 
ent segments of the sequence have different compositions, 
their local free energetics will vary. In a region with com- 
position g, the apparent distribution of monomer types 
is modified from Eq. [12, 



w{(Ji;q) (X exp [-[ai - qf - q^)i? 



(31) 



These modified sequence statistics alter the local distri- 
bution of energies, for both globule and coil subunits. 

In the context of the random energy model discussed in 
Sec. III. A, the variation in local sequence statistics mod- 
ulates the mean and variance of conformational energies. 



In effect, each segment of the sequence has a different as- 
sociated random energy model. A globular subunit will 
therefore have a different ground state energy for each 
location in the sequence. Specifically, the local value of 
q shifts the mean energy by an amount xg^pM -I- TqpA, 
and reduces the variance of monomer identities, /i, by a 
factor ^/\ — cp. As a result, the characteristic ground 
state energy in a sequence region with composition q is 



E* = BapM + xq^pM + TqpA 



+XP^\l-q^)pM + 0{M''^). (32) 

Variations in E* along the sequence contribute to the 
random potential u{x). The magnitude of this contribu- 
tion is computed by averaging variations in E* over the 
distribution of q in Eq. E9f 



{{5E*f 



TfipM 



1/6 



(33) 



Fluctuations in u{x) due to energetics of a globular region 
of length M thus arise from solvation at leading order, 
and grow rather slowly with increasing M . 

Variations in the solvation of coil regions are more siz- 
able. In a region of length M and composition q, the 
solvation energy is TqM. Fluctuations of this energy are 
of size r^M^/^(l — M/A^)~^/^. These variations are con- 
siderably larger than those of globule energetics for large 
M, and set the scale of Au. We show below that this sol- 
vation effect is sufficient to stabilize necklace structures 
for long but finite chains. 

We focus on necklace structures including m globular 
regions, each with M monomers. Thermodynamics of 
this class of structures may be approximated by draw- 
ing Q values at random from a Gaussian distribution 
with variance Au. Here, f2 is the number of statistically 
independent arrangements of the globular regions, Q « 
[M-"'{N-M+1){N-2M+1) . . . {N-mM+l)/m\]. The 
thermodynamics of large systems with independently dis- 
tributed random energies are well known |^,^ . In this 
case, the free energy of spontaneous fluctuations in the 
random potential u{x) is given by 



^rand(M, "^) 



-Inn T[l 



{TdTf 



T>Te 
T<Tt. 



(34) 



In Eq. H, ^ Tp{mMf/^{l - mM/Nf'^ /{2\nQ.Y/^ 
is the temperature at which globular regions become lo- 
calized in the sequence. For T < Ti, the necklace has a 
frozen arrangement of subunits, and does not reorganize. 
Combining Eq. ^ with Eqs. |l| and ||, we obtain the total 
free energy of a necklace structure, 

J^„cek(M, m) = mTg{M) + T^{N ~ mM) + J^rand(M, m). 

(35) 

The pure globule and coil states of a heteropolymer are 
also described by Eq. ^ for M = and M = 0, respec- 
tively. 
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(a) 




,0 100.0 



FIG. 5. Distribution of necklace structures, p{M), for the 
thermodynamic state T = 0.56^, / = 0.9561/0 with = 100. 
Necklaces consisting of a single large globule {M = 0(N)) 
and short coil regions are most probable at these condi- 
tions. But necklaces with many small globules (M <g A'^) 
are also strongly represented. Globules of intermediate size 
{M « N /2) occur with a small but nonnegligible probability. 
A minimum size of 5 monomers is chosen for globular regions. 



The scaling of Eq. |3J suggests that the distribution of 
globular region sizes may be quite broad. For a neck- 
lace with small globules {M <C N), ^rand is optimized 
with a large number of regions, m = 0{N). In this case, 
•^rand ^ N^^^. Foi large globules, the constraint that 
mM < N requires that m = In this case as well, 

^rand ~ A^^^^. Surfacc cffccts will yield a preference for 
a small number of large globular regions, but this scaling 
indicates that a broad ensemble of globule sizes may be 
important at a single thermodynamic state. This possi- 
bility is demonstrated in Fig. 0, in which the distribution 
of globule sizes, p{M) oc exp [- •^ncck(M, m)/r], is 
plotted as a function of M for a thermodynamic state 
near the globule-coil transition. Here, primed sums are 
restricted to m < N/M. The weight of necklaces con- 
sisting of a single large globule is comparable to that of 
necklaces comprised of many small globules. As a result, 
fluctuations in polymer extension are considerable near 
the transition. 

We determine a phase diagram for stretching of ran- 
dom heteropolymers by computing the total fraction of 
monomers belonging to globular regions, 0g: 



J2m Sm "^^'^ exp [-Tneckjm, M)/T] 
EAfE:„exph^„eck(m,M)/T] 



(36) 



When > 0.05, we consider the heteropolymer to be 
in a globular state. Similarly, when (/ig < 0.95, we con- 
sider it to be in a coil state. In the intermediate regime, 
0.05 < 0g < 0.95, necklace structures are prevalent. Re- 
sults are plotted in Fig. |[ For N — 100, the extent 
of the necklace phase prevents the possibility of a sharp 
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FIG. 6. Phase diagrams for random heteropolymer 
stretching computed from Eq. Shaded regions mark a 
necklace phase, in which the fraction of monomers belonging 
to globular regions lies between 0.05 and 0.95. For these cal- 
culations, we have chosen the scale of monomer solvation ener- 
getics to coincide with that of interactions between monomers, 
T K, 6. We have also chosen the variance of monomer identi- 
ties, /i, to be unity. 



transition from globule to coil. This result is consistent 
with simulated stretching of a short {N = 27) random 
heteropolymer, suggesting that intermediates states de- 
scribed in Ref. should be identified with the necklace 
structures we consider here. But the stabilization of neck- 
laces arising from JFiand scales only as N^^"^, and is over- 
come for long chains by 0{N) contributions of the first 
The range of stretching forces over 



two terms in Eq. 34 



which necklaces are stable is therefore significantly dimin- 
ished for N — 1000, and vanishes as iV ^ oo. Stretch- 
ing behavior of infinitely long random heteropolymers 
is indistinguishable from that of homopolymers. But for 
chain lengths relevant to macromolecules, necklace struc- 
tures can play an important role. The large fluctuations 
in extension accompanying these structures explains the 
observed stretching behavior of proteins like barnase |^ 
that are not designed for mechanical functions. 
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D. Sequence statistics 

In the previous section we showed that the distribu- 
tion of local sequence compositions plays an important 
role in the stretching of heteropolymers with uncorrelated 
random sequences. Introducing correlations into the se- 
quence statistics will alter this local composition, and 
may therefore affect stretching behavior strongly. Here 
we consider three simple, prototypical forms of correlated 
statistics. 

When monomers of the same type are likely to be 
found within a correlation "length" in the sequence, 
clusters of like monomers occur with high probability. 
The weight of necklace structures is clearly enhanced for 
such blocky sequences, since regions of the chain with 
unfavorable solvation energy may be almost completely 
shielded from the solvent. Specifically, the distribution 
of local sequence compositions is nearly binary and inde- 
pendent of region size M for Af < ^: 

p{q)^l{S,,i+S,,^^). (37) 

As a result, fluctuations in solvation energy of exposed 
coil regions are of size F^. If ^ '--^ A^, the free energy stabi- 
lizing necklace structures is also macroscopic, J^rand ~ N. 
With this macroscopic stabilization, the necklace phase 
will be stable over a finite range of / even as iV — > c». 

When correlations in monomer type decay alge- 
braically, rather than exponentially, similarly dramatic 
fluctuations in local composition are possible. Specif- 
ically, power law correlations with decay exponent i] 
yield {{5q)'^)q ~ . Because the sequence is one- 

dimensional, fluctuations are enhanced only for 77 < 1 At 
the crossover (77 = 1), {{SqY)q ^ InM, providing a 
weak stabilization of necklaces. But as ^ 0, clusters 
of like monomers may be arbitrarily large. In this limit, 
necklace stabilization again becomes macroscopic. 

Anticorrelations between like monomers, on the other 
hand, tends to destabilize necklace structures. In this 
case, the local composition is essentially "neutral" (i.e., 
g « 0) for regions larger than the scale of anticorrela- 
tions, ^. For the statistics of neutrality fluctuations in 
Coulombic systems, {{Sq)'^)q ~ A/^^/^, fluctuations in 
the effective random potential, Au = 0(1), are especially 
small. On scales larger than ^, the chain is effectively a 
homopolymer. Consequently, necklace structures are not 
sufhciently stable to appear away from globule-coil coex- 
istence. 

These results suggest some basic principles for design- 
ing mechanically robust heteropolymers. Typical ran- 
dom sequences are poor candidates, since they tend to 
form partially unfolded necklace structures under strain. 
Sequences in which solvophobic groups are heavily clus- 
tered together will typically also permit stable ensembles 
of necklace structures. Most promising are sequences 
whose correlations suppress fluctuations in local com- 
position. These may represent, for example, molecules 



with widely distributed hydrophobic groups. The com- 
pact native structures of such molecules generally include 
important contacts linking distant segments of the chain. 
It is in part this topology imposed by nonlocal contacts 
that provides a collective resistance to strain. Indeed, 
a common structural motif of mechanical proteins, /3- 
sheet secondary structure, is typically rich in nonlocal 
hydrophobic contacts. The elements of sequence statis- 
tics we predict to be favorable for mechanical strength 
are thus related to topological features of the native state 
suggested by computer simulations |^,^ . It will be inter- 
esting to see how these basic principles compare with sim- 
ulations of evolutionary design for mechanical strength. 
For commonly used, coarse-grained models of proteins, 
such simulations should be feasible using current com- 
putational resources and are currently underway in our 
laboratory. 
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